Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return EOF and ErrUnexpectedEOF correctly #64

Merged
merged 16 commits into from
Mar 17, 2022

Conversation

MarcoPolo
Copy link
Contributor

Context

Fixes #62. See that issue for context.

Proposed solution

At first I considered trying to find all the spots after our first read and return ErrUnexpectedEOF, but that was error prone.

Instead I created a new wrapper type that implements io.Reader but keeps context on whether this is the first read or not. If it is the first read it returns what the inner reader returns. Otherwise, if it isn't the first read and the inner reader returns EOF, this reader knows that we ran into an EOF in the context of expecting to read more data from this inner reader, so the wrapper reader returns ErrUnexpectedEOF instead of EOF.

This does add overhead, so I'm curious to see if we notice it.

r? @Stebalien

@Stebalien
Copy link
Collaborator

I pushed a small fix to make link scanning a bit faster, but it's still pretty terrible.

Also, I think we need to update "has read once" in all the "read a byte" methods, which will make performance worse.

You can test performance with go test -bench . in the testing directory.

@MarcoPolo
Copy link
Contributor Author

The ScanForLinks method is pretty small so we could avoid allocation there like in my latest commit.

@Stebalien
Copy link
Collaborator

Yeah, the performance seems reasonable now. It's slightly slower, but not by much.

Also regenerate tests, that's how I found this.
@Stebalien
Copy link
Collaborator

Hm. Actually, it's about 2x slower on decode and has 3x the number of allocations. I needed to regenerate the tests to find that.

@Stebalien
Copy link
Collaborator

So, I've done what I can, but it's still much slower. Remaining things are:

  • GetPeeker has lost some of it's magic because it can't see through the new reader type.
  • I'm not sure where the new unmarshal allocations are coming from.

I recommend testing with:

go test -bench . -benchmem -memprofile=mem.out -cpuprofile=cpu.out

Then look at the allocation counts and cpu profile with pprof.

@MarcoPolo
Copy link
Contributor Author

oh I see my mistake, I didn't realize I needed to regen the _gen.go files

@MarcoPolo
Copy link
Contributor Author

https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

Now we're at fewer allocs than before after the GetPeeker change, but still 8 more than before.

@MarcoPolo
Copy link
Contributor Author

ah I see what's happening.

We pass in the reader from GetPeeker to the nested calls to UnmarshalCbor. And that isn't a ReaderWithEOFContext so we allocate a new one. (Also I think my last commit subtlety breaks this since it essentially unwraps the Reader from ReaderWithEOFContext and dropping the context).

I wonder if doing a defer at the top level would also solve this without requiring us to wrap the reader. I'll try this tomorrow.

Thanks for the tips :)

@MarcoPolo
Copy link
Contributor Author

Alright I'm using defer now to remove the extra allocation from the wrapped reader. My understanding of simpler defer func statements is that they are inlined by the compiler (please correct me if I'm wrong).

I updated the benchmark results here: https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

So no extra allocation compared to master, but a smidge slower.

Also I think this needs a more careful review since it now relies on us doing this at the first read in Unmarshal/scanforlinks.

Copy link
Collaborator

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually seems quite a bit cleaner.

gen.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Collaborator

  • Also fixed Deferred to handle EOF.

From my benchmarks, any perf difference is noise. But we should get a third-party review from someone not involved.

@MarcoPolo
Copy link
Contributor Author

Let me add a test case here too

@MarcoPolo
Copy link
Contributor Author

(test currently fails, I'll investigate in a bit)

@Stebalien
Copy link
Collaborator

Ah... we're doing error wrapping internally.

@MarcoPolo
Copy link
Contributor Author

yup, I switched to using errors.Is. Which does make me a bit uneasy since it needs to traverse the error chain to see if an error matches.

The benchmarks don't show much though: https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

gen.go Outdated
if err == io.EOF {
err = io.ErrUnexpectedEOF
if errors.Is(err, io.EOF) {
err = xerrors.Errorf("%w: %v", io.ErrUnexpectedEOF, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't quite work. %w needs to be last.

It's probably fine to just leave it as-is? The only thing we really care about is not returning an EOF if we've read bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it? https://go.dev/play/p/k2wOyoCk4WO makes it seem like both work.

I figured we cared enough to wrap the original error, so might be nice to keep that info.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, being last only matters for Unwrap. (docs: https://pkg.go.dev/golang.org/x/xerrors#Errorf)

I figured we cared enough to wrap the original error, so might be nice to keep that info.

My point was: that we could just leave everything as-is. I.e., we don't want an EOF unless we read no data, but we're fine with any other error (including a wrapped EOF) otherwise.

But if this works and doesn't hurt performance too much, then it's fine by me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, being last only matters for Unwrap. (docs: https://pkg.go.dev/golang.org/x/xerrors#Errorf)

From that link

If the format specifier includes a %w verb with an error operand in a position other than at the end, the returned error will still implement an Unwrap method returning the operand, but the error's Format method will not return the wrapped error.

What is the "error's Format method" ?

anyways, I understand what you mean by as-is. I changed this to return the wrapped error and only replace if we return EOF. I also changed the test to only check that we aren't returning EOF exactly if it could read some bytes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "error's Format method" ?

🤷‍♂️ I'm having a lot of difficulty with those docs.

@MarcoPolo
Copy link
Contributor Author

Friendly bump on this @Stebalien :)

Copy link
Collaborator

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Stebalien Stebalien merged commit 87edca1 into whyrusleeping:master Mar 17, 2022
@MarcoPolo MarcoPolo deleted the MarcoPolo/issue62 branch March 21, 2022 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only return EOF when we've read no bytes
2 participants